Overview

Dataset statistics

Number of variables: 12
Number of observations: 33045
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 64
Duplicate rows (%): 0.2%
Total size in memory: 3.0 MiB
Average record size in memory: 96.0 B
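
The percentage and size figures above follow from the raw counts. A minimal sketch of that arithmetic, using only numbers printed in this block (the reading of 8 bytes per value as one 64-bit slot per cell is an assumption, not something the report states):

```python
# Figures copied from the "Dataset statistics" block.
n_rows = 33045
n_columns = 12
duplicate_rows = 64
avg_record_bytes = 96.0

duplicate_pct = 100 * duplicate_rows / n_rows       # share of duplicate rows
total_mib = n_rows * avg_record_bytes / 2**20       # bytes -> MiB
bytes_per_value = avg_record_bytes / n_columns      # assumed: one 8-byte slot per cell

print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"total size: {total_mib:.1f} MiB")
print(f"bytes per value: {bytes_per_value:.0f}")
```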

Variable types

Categorical: 9
Numeric: 3

Alerts

Dataset has 64 (0.2%) duplicate rows [Duplicates]
Date has a high cardinality: 1697 distinct values [High cardinality]
Products has a high cardinality: 105 distinct values [High cardinality]
Quantity is highly overall correlated with Value [High correlation]
Rate is highly overall correlated with FY and 1 other field [High correlation]
Value is highly overall correlated with Quantity [High correlation]
FY is highly overall correlated with Rate and 1 other field [High correlation]
dia is highly overall correlated with dia group [High correlation]
dia group is highly overall correlated with dia [High correlation]
grade is highly overall correlated with Rate and 1 other field [High correlation]
type is highly overall correlated with length [High correlation]
length is highly overall correlated with type [High correlation]
type is highly imbalanced (77.2%) [Imbalance]
length is highly imbalanced (82.4%) [Imbalance]
Voucher Type is highly imbalanced (85.4%) [Imbalance]
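
The imbalance percentages can be reproduced from the category counts listed under each variable. The sketch below assumes the score is one minus the normalised Shannon entropy, 1 - H / log2(k); the report does not state the formula, but this definition matches the printed 77.2% and 85.4%. The function name is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    (Assumed definition; it reproduces the figures printed in the Alerts list.)
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts copied from the "type" and "Voucher Type" sections of this report.
type_counts = [30477, 1290, 615, 369, 294]
voucher_counts = [31578, 1286, 99, 82]

print(f"type: {imbalance_score(type_counts):.1%}")
print(f"Voucher Type: {imbalance_score(voucher_counts):.1%}")
```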

Reproduction

Analysis started: 2023-08-13 14:07:51.392828
Analysis finished: 2023-08-13 14:08:11.408152
Duration: 20.02 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json

Variables

Date
Categorical

Distinct: 1697
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
01-09-2020 | 62
12/30/2021 | 62
01-10-2019 | 59
8/30/2019 | 58
8/30/2018 | 56
Other values (1692) | 32748

Length

Max length: 10
Median length: 10
Mean length: 9.5063701
Min length: 9

Characters and Unicode

Total characters: 314138
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
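
The length and character totals for Date are enough to recover how the values split by string length and by separator. A small worked sketch, using only figures printed in this section (it assumes every value contains exactly two separators, which the counts are consistent with):

```python
n_rows = 33045
total_chars = 314138
min_len, max_len = 9, 10          # from the Length block
slash_chars, dash_chars = 43182, 22908   # from "Most occurring characters"

mean_len = total_chars / n_rows

# Only lengths 9 and 10 occur, so every character beyond 9 per row
# belongs to a 10-character value.
n_long = total_chars - min_len * n_rows
n_short = n_rows - n_long

# Assumed: two separators per value, so halving gives rows per layout.
slash_rows = slash_chars // 2
dash_rows = dash_chars // 2

print(f"mean length: {mean_len:.7f}")
print(f"10-char values: {n_long}, 9-char values: {n_short}")
print(f"'/' rows: {slash_rows}, '-' rows: {dash_rows}")
```

The two separator counts add up to the full row count, which suggests the column mixes two date layouts rather than containing malformed values.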

Unique

Unique: 15
Unique (%): < 0.1%

Sample

1st row: 04-03-2017
2nd row: 04-03-2017
3rd row: 04-03-2017
4th row: 04-03-2017
5th row: 04-03-2017

Common Values

Value | Count | Frequency (%)
01-09-2020 | 62 | 0.2%
12/30/2021 | 62 | 0.2%
01-10-2019 | 59 | 0.2%
8/30/2019 | 58 | 0.2%
8/30/2018 | 56 | 0.2%
3/15/2019 | 55 | 0.2%
3/31/2019 | 55 | 0.2%
7/13/2019 | 54 | 0.2%
12-09-2019 | 54 | 0.2%
7/30/2019 | 53 | 0.2%
Other values (1687) | 32477 | 98.3%
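
Date is profiled as Categorical because it is stored as text in two layouts (dash-separated and slash-separated). A minimal sketch of normalising such strings before re-profiling; it assumes both layouts are month-first, which the report does not confirm, and `parse_mixed_date` is a hypothetical helper name:

```python
from datetime import date, datetime

# Assumed layouts: both month-first. The report does not state the convention.
ASSUMED_FORMATS = ("%m/%d/%Y", "%m-%d-%Y")

def parse_mixed_date(text: str) -> date:
    """Try each assumed layout in turn and return the first successful parse."""
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date layout: {text!r}")

for raw in ("01-09-2020", "12/30/2021", "8/30/2019"):
    print(raw, "->", parse_mixed_date(raw).isoformat())
```

If the dash-separated values turn out to be day-first, only the second entry of `ASSUMED_FORMATS` needs to change.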

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
2 75436
24.0%
0 61046
19.4%
1 50261
16.0%
/ 43182
13.7%
- 22908
 
7.3%
9 13123
 
4.2%
8 12418
 
4.0%
7 10368
 
3.3%
3 8965
 
2.9%
6 5748
 
1.8%
Other values (2) 10683
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 248048
79.0%
Other Punctuation 43182
 
13.7%
Dash Punctuation 22908
 
7.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 75436
30.4%
0 61046
24.6%
1 50261
20.3%
9 13123
 
5.3%
8 12418
 
5.0%
7 10368
 
4.2%
3 8965
 
3.6%
6 5748
 
2.3%
5 5590
 
2.3%
4 5093
 
2.1%
Other Punctuation
ValueCountFrequency (%)
/ 43182
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 22908
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 314138
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 75436
24.0%
0 61046
19.4%
1 50261
16.0%
/ 43182
13.7%
- 22908
 
7.3%
9 13123
 
4.2%
8 12418
 
4.0%
7 10368
 
3.3%
3 8965
 
2.9%
6 5748
 
1.8%
Other values (2) 10683
 
3.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 314138
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 75436
24.0%
0 61046
19.4%
1 50261
16.0%
/ 43182
13.7%
- 22908
 
7.3%
9 13123
 
4.2%
8 12418
 
4.0%
7 10368
 
3.3%
3 8965
 
2.9%
6 5748
 
1.8%
Other values (2) 10683
 
3.4%

FY
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
FY 20 | 6402
FY 19 | 6251
FY 22 | 5633
FY 18 | 5157
FY 23 | 4876

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 165225
Distinct characters: 9
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FY 18
2nd row: FY 18
3rd row: FY 18
4th row: FY 18
5th row: FY 18

Common Values

Value | Count | Frequency (%)
FY 20 | 6402 | 19.4%
FY 19 | 6251 | 18.9%
FY 22 | 5633 | 17.0%
FY 18 | 5157 | 15.6%
FY 23 | 4876 | 14.8%
FY 21 | 4726 | 14.3%
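
The FY labels look derivable from Date. The sketch below assumes an April-to-March fiscal year labelled by its ending year; the report does not state this rule, but it is consistent with the sample rows when dates are read month-first (for example 04-03-2017 paired with FY 18, and 3/24/2023 with FY 23). The function name and `start_month` parameter are illustrative.

```python
from datetime import date

def fiscal_year_label(d: date, start_month: int = 4) -> str:
    """Label a date with the fiscal year it falls in, named by the ending year.

    Assumed rule: the fiscal year starts in `start_month` (April by default).
    """
    fy_end_year = d.year + 1 if d.month >= start_month else d.year
    return f"FY {fy_end_year % 100:02d}"

print(fiscal_year_label(date(2017, 4, 3)))    # first sample row
print(fiscal_year_label(date(2023, 3, 24)))   # one of the last sample rows
```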

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
fy 33045
50.0%
20 6402
 
9.7%
19 6251
 
9.5%
22 5633
 
8.5%
18 5157
 
7.8%
23 4876
 
7.4%
21 4726
 
7.2%

Most occurring characters

ValueCountFrequency (%)
F 33045
20.0%
Y 33045
20.0%
33045
20.0%
2 27270
16.5%
1 16134
9.8%
0 6402
 
3.9%
9 6251
 
3.8%
8 5157
 
3.1%
3 4876
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 66090
40.0%
Decimal Number 66090
40.0%
Space Separator 33045
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 27270
41.3%
1 16134
24.4%
0 6402
 
9.7%
9 6251
 
9.5%
8 5157
 
7.8%
3 4876
 
7.4%
Uppercase Letter
ValueCountFrequency (%)
F 33045
50.0%
Y 33045
50.0%
Space Separator
ValueCountFrequency (%)
33045
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 99135
60.0%
Latin 66090
40.0%

Most frequent character per script

Common
ValueCountFrequency (%)
33045
33.3%
2 27270
27.5%
1 16134
16.3%
0 6402
 
6.5%
9 6251
 
6.3%
8 5157
 
5.2%
3 4876
 
4.9%
Latin
ValueCountFrequency (%)
F 33045
50.0%
Y 33045
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 165225
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 33045
20.0%
Y 33045
20.0%
33045
20.0%
2 27270
16.5%
1 16134
9.8%
0 6402
 
3.9%
9 6251
 
3.8%
8 5157
 
3.1%
3 4876
 
3.0%

Products
Categorical

Distinct: 105
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
08MM TATA TISCON FE500D (S) | 3866
16MM TATA TISCON FE500D (S) | 3532
12MM TATA TISCON FE500D (S) | 3486
10MM TATA TISCON FE500D (S) | 3088
16MM TATA TISCON FE500D (T) | 1837
Other values (100) | 17236

Length

Max length: 44
Median length: 27
Mean length: 27.865154
Min length: 9

Characters and Unicode

Total characters: 920804
Distinct characters: 35
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): < 0.1%

Sample

1st row: 25MM TATA TISCON FE500D (S)
2nd row: 08MM TATA TISCON FE500D (S)
3rd row: 10MM TATA TISCON FE500D (S)
4th row: 12MM TATA TISCON FE500D (S)
5th row: 16MM TATA TISCON FE500D (S)
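
The dia and grade columns appear to repeat information already embedded in the product name. How they were actually derived is not stated in this report; the following is only a sketch of one way to extract them with regular expressions, with hypothetical pattern and function names:

```python
import re

# Two digits followed by "MM" (any case), not preceded by another digit.
DIA_RE = re.compile(r"(?<!\d)(\d{2})\s*MM", re.IGNORECASE)
# "FE", three digits, then "D", allowing optional spaces (e.g. "FE500 D").
GRADE_RE = re.compile(r"FE\s*(\d{3})\s*D", re.IGNORECASE)

def split_product(name: str):
    """Return (dia, grade) parsed from a product name, or None where absent."""
    dia = DIA_RE.search(name)
    grade = GRADE_RE.search(name)
    return (
        f"{dia.group(1)} MM" if dia else None,
        f"{grade.group(1)}D" if grade else None,
    )

print(split_product("25MM TATA TISCON FE500D (S)"))
print(split_product("TISCON TMT COIL IS:1786 FE500 D 12 mm(T)"))
```

Both product strings are taken from tables in this report; names that follow other conventions may need additional patterns.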

Common Values

Value | Count | Frequency (%)
08MM TATA TISCON FE500D (S) | 3866 | 11.7%
16MM TATA TISCON FE500D (S) | 3532 | 10.7%
12MM TATA TISCON FE500D (S) | 3486 | 10.5%
10MM TATA TISCON FE500D (S) | 3088 | 9.3%
16MM TATA TISCON FE500D (T) | 1837 | 5.6%
20MM TATA TISCON FE500D (S) | 1774 | 5.4%
08MM TATA TISCON FE500D (T) | 1748 | 5.3%
10MM TATA TISCON FE500D (T) | 1664 | 5.0%
12MM TATA TISCON FE500D (T) | 1571 | 4.8%
20MM TATA TISCON FE500D (T) | 1285 | 3.9%
Other values (95) | 9194 | 27.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tiscon 31860
18.9%
tata 31727
18.8%
fe500d 27367
16.2%
s 17522
10.4%
t 14224
8.4%
08mm 6898
 
4.1%
16mm 6483
 
3.8%
12mm 6284
 
3.7%
10mm 5634
 
3.3%
fe550d 4577
 
2.7%
Other values (53) 16012
9.5%

Most occurring characters

ValueCountFrequency (%)
136210
14.8%
T 115582
12.6%
0 78305
 
8.5%
M 66146
 
7.2%
A 63456
 
6.9%
S 54759
 
5.9%
5 39580
 
4.3%
I 36184
 
3.9%
C 35277
 
3.8%
O 34908
 
3.8%
Other values (25) 260397
28.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 542799
58.9%
Decimal Number 170544
 
18.5%
Space Separator 136210
 
14.8%
Close Punctuation 32726
 
3.6%
Open Punctuation 32726
 
3.6%
Lowercase Letter 3057
 
0.3%
Dash Punctuation 1499
 
0.2%
Other Punctuation 1243
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 115582
21.3%
M 66146
12.2%
A 63456
11.7%
S 54759
10.1%
I 36184
 
6.7%
C 35277
 
6.5%
O 34908
 
6.4%
E 33999
 
6.3%
D 33913
 
6.2%
N 33003
 
6.1%
Other values (6) 35572
 
6.6%
Decimal Number
ValueCountFrequency (%)
0 78305
45.9%
5 39580
23.2%
1 20750
 
12.2%
2 12862
 
7.5%
8 8868
 
5.2%
6 8015
 
4.7%
7 1545
 
0.9%
3 527
 
0.3%
4 92
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 32721
> 99.9%
] 5
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 32721
> 99.9%
[ 5
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
m 2763
90.4%
e 294
 
9.6%
Other Punctuation
ValueCountFrequency (%)
: 1232
99.1%
. 11
 
0.9%
Space Separator
ValueCountFrequency (%)
136210
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1499
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 545856
59.3%
Common 374948
40.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 115582
21.2%
M 66146
12.1%
A 63456
11.6%
S 54759
10.0%
I 36184
 
6.6%
C 35277
 
6.5%
O 34908
 
6.4%
E 33999
 
6.2%
D 33913
 
6.2%
N 33003
 
6.0%
Other values (8) 38629
 
7.1%
Common
ValueCountFrequency (%)
136210
36.3%
0 78305
20.9%
5 39580
 
10.6%
) 32721
 
8.7%
( 32721
 
8.7%
1 20750
 
5.5%
2 12862
 
3.4%
8 8868
 
2.4%
6 8015
 
2.1%
7 1545
 
0.4%
Other values (7) 3371
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 920804
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
136210
14.8%
T 115582
12.6%
0 78305
 
8.5%
M 66146
 
7.2%
A 63456
 
6.9%
S 54759
 
5.9%
5 39580
 
4.3%
I 36184
 
3.9%
C 35277
 
3.8%
O 34908
 
3.8%
Other values (25) 260397
28.3%

dia
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
08 MM | 7508
16 MM | 6735
12 MM | 6575
10 MM | 5953
20 MM | 3695
Other values (5) | 2579

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 165225
Distinct characters: 9
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 25 MM
2nd row: 08 MM
3rd row: 10 MM
4th row: 12 MM
5th row: 16 MM

Common Values

Value | Count | Frequency (%)
08 MM | 7508 | 22.7%
16 MM | 6735 | 20.4%
12 MM | 6575 | 19.9%
10 MM | 5953 | 18.0%
20 MM | 3695 | 11.2%
25 MM | 1979 | 6.0%
32 MM | 525 | 1.6%
28 MM | 71 | 0.2%
06 MM | 2 | < 0.1%
36 MM | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
mm 33045
50.0%
08 7508
 
11.4%
16 6735
 
10.2%
12 6575
 
9.9%
10 5953
 
9.0%
20 3695
 
5.6%
25 1979
 
3.0%
32 525
 
0.8%
28 71
 
0.1%
06 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
M 66090
40.0%
33045
20.0%
1 19263
 
11.7%
0 17158
 
10.4%
2 12845
 
7.8%
8 7579
 
4.6%
6 6739
 
4.1%
5 1979
 
1.2%
3 527
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 66090
40.0%
Decimal Number 66090
40.0%
Space Separator 33045
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 19263
29.1%
0 17158
26.0%
2 12845
19.4%
8 7579
 
11.5%
6 6739
 
10.2%
5 1979
 
3.0%
3 527
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
M 66090
100.0%
Space Separator
ValueCountFrequency (%)
33045
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 99135
60.0%
Latin 66090
40.0%

Most frequent character per script

Common
ValueCountFrequency (%)
33045
33.3%
1 19263
19.4%
0 17158
17.3%
2 12845
 
13.0%
8 7579
 
7.6%
6 6739
 
6.8%
5 1979
 
2.0%
3 527
 
0.5%
Latin
ValueCountFrequency (%)
M 66090
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 165225
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 66090
40.0%
33045
20.0%
1 19263
 
11.7%
0 17158
 
10.4%
2 12845
 
7.8%
8 7579
 
4.6%
6 6739
 
4.1%
5 1979
 
1.2%
3 527
 
0.3%

dia group
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
12 MM - 32 MM | 19542
08 MM | 7508
10 MM | 5953
28 MM | 40
06 MM | 2

Length

Max length: 13
Median length: 13
Mean length: 9.7310032
Min length: 5

Characters and Unicode

Total characters: 321561
Distinct characters: 9
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 12 MM - 32 MM
2nd row: 08 MM
3rd row: 10 MM
4th row: 12 MM - 32 MM
5th row: 12 MM - 32 MM

Common Values

Value | Count | Frequency (%)
12 MM - 32 MM | 19542 | 59.1%
08 MM | 7508 | 22.7%
10 MM | 5953 | 18.0%
28 MM | 40 | 0.1%
06 MM | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
mm 52587
42.2%
12 19542
 
15.7%
19542
 
15.7%
32 19542
 
15.7%
08 7508
 
6.0%
10 5953
 
4.8%
28 40
 
< 0.1%
06 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
M 105174
32.7%
91671
28.5%
2 39124
 
12.2%
1 25495
 
7.9%
- 19542
 
6.1%
3 19542
 
6.1%
0 13463
 
4.2%
8 7548
 
2.3%
6 2
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 105174
32.7%
Decimal Number 105174
32.7%
Space Separator 91671
28.5%
Dash Punctuation 19542
 
6.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 39124
37.2%
1 25495
24.2%
3 19542
18.6%
0 13463
 
12.8%
8 7548
 
7.2%
6 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
M 105174
100.0%
Space Separator
ValueCountFrequency (%)
91671
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 19542
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 216387
67.3%
Latin 105174
32.7%

Most frequent character per script

Common
ValueCountFrequency (%)
91671
42.4%
2 39124
18.1%
1 25495
 
11.8%
- 19542
 
9.0%
3 19542
 
9.0%
0 13463
 
6.2%
8 7548
 
3.5%
6 2
 
< 0.1%
Latin
ValueCountFrequency (%)
M 105174
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 321561
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 105174
32.7%
91671
28.5%
2 39124
 
12.2%
1 25495
 
7.9%
- 19542
 
6.1%
3 19542
 
6.1%
0 13463
 
4.2%
8 7548
 
2.3%
6 2
 
< 0.1%

grade
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
500D | 28460
550D | 4585

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 132180
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 500D
2nd row: 500D
3rd row: 500D
4th row: 500D
5th row: 500D

Common Values

Value | Count | Frequency (%)
500D | 28460 | 86.1%
550D | 4585 | 13.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
500d 28460
86.1%
550d 4585
 
13.9%

Most occurring characters

ValueCountFrequency (%)
0 61505
46.5%
5 37630
28.5%
D 33045
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 99135
75.0%
Uppercase Letter 33045
 
25.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 61505
62.0%
5 37630
38.0%
Uppercase Letter
ValueCountFrequency (%)
D 33045
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 99135
75.0%
Latin 33045
 
25.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 61505
62.0%
5 37630
38.0%
Latin
ValueCountFrequency (%)
D 33045
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 132180
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 61505
46.5%
5 37630
28.5%
D 33045
25.0%

type
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
FULL LENGTH | 30477
CAB | 1290
COIL | 615
CRSD | 369
SHORT LENGTH | 294

Length

Max length: 12
Median length: 11
Mean length: 10.488153
Min length: 3

Characters and Unicode

Total characters: 346581
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FULL LENGTH
2nd row: FULL LENGTH
3rd row: FULL LENGTH
4th row: FULL LENGTH
5th row: FULL LENGTH

Common Values

Value | Count | Frequency (%)
FULL LENGTH | 30477 | 92.2%
CAB | 1290 | 3.9%
COIL | 615 | 1.9%
CRSD | 369 | 1.1%
SHORT LENGTH | 294 | 0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
length 30771
48.2%
full 30477
47.8%
cab 1290
 
2.0%
coil 615
 
1.0%
crsd 369
 
0.6%
short 294
 
0.5%

Most occurring characters

ValueCountFrequency (%)
L 92340
26.6%
H 31065
 
9.0%
T 31065
 
9.0%
30771
 
8.9%
E 30771
 
8.9%
N 30771
 
8.9%
G 30771
 
8.9%
U 30477
 
8.8%
F 30477
 
8.8%
C 2274
 
0.7%
Other values (7) 5799
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 315810
91.1%
Space Separator 30771
 
8.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 92340
29.2%
H 31065
 
9.8%
T 31065
 
9.8%
E 30771
 
9.7%
N 30771
 
9.7%
G 30771
 
9.7%
U 30477
 
9.7%
F 30477
 
9.7%
C 2274
 
0.7%
A 1290
 
0.4%
Other values (6) 4509
 
1.4%
Space Separator
ValueCountFrequency (%)
30771
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 315810
91.1%
Common 30771
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 92340
29.2%
H 31065
 
9.8%
T 31065
 
9.8%
E 30771
 
9.7%
N 30771
 
9.7%
G 30771
 
9.7%
U 30477
 
9.7%
F 30477
 
9.7%
C 2274
 
0.7%
A 1290
 
0.4%
Other values (6) 4509
 
1.4%
Common
ValueCountFrequency (%)
30771
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 346581
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 92340
26.6%
H 31065
 
9.0%
T 31065
 
9.0%
30771
 
8.9%
E 30771
 
8.9%
N 30771
 
8.9%
G 30771
 
8.9%
U 30477
 
8.8%
F 30477
 
8.8%
C 2274
 
0.7%
Other values (7) 5799
 
1.7%

length
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
12 METER | 30846
CUSTOMISED | 1290
0 METER | 615
7 - 10 METER | 177
4 - 7 METER | 92

Length

Max length: 13
Median length: 8
Mean length: 8.0930247
Min length: 7

Characters and Unicode

Total characters: 267434
Distinct characters: 17
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 12 METER
2nd row: 12 METER
3rd row: 12 METER
4th row: 12 METER
5th row: 12 METER

Common Values

Value | Count | Frequency (%)
12 METER | 30846 | 93.3%
CUSTOMISED | 1290 | 3.9%
0 METER | 615 | 1.9%
7 - 10 METER | 177 | 0.5%
4 - 7 METER | 92 | 0.3%
10 - 12 METER | 25 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
meter 31755
48.6%
12 30871
47.2%
customised 1290
 
2.0%
0 615
 
0.9%
294
 
0.4%
7 269
 
0.4%
10 202
 
0.3%
4 92
 
0.1%

Most occurring characters

ValueCountFrequency (%)
E 64800
24.2%
M 33045
12.4%
T 33045
12.4%
32343
12.1%
R 31755
11.9%
1 31073
11.6%
2 30871
11.5%
S 2580
 
1.0%
I 1290
 
0.5%
D 1290
 
0.5%
Other values (7) 5342
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 171675
64.2%
Decimal Number 63122
 
23.6%
Space Separator 32343
 
12.1%
Dash Punctuation 294
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 64800
37.7%
M 33045
19.2%
T 33045
19.2%
R 31755
18.5%
S 2580
 
1.5%
I 1290
 
0.8%
D 1290
 
0.8%
U 1290
 
0.8%
O 1290
 
0.8%
C 1290
 
0.8%
Decimal Number
ValueCountFrequency (%)
1 31073
49.2%
2 30871
48.9%
0 817
 
1.3%
7 269
 
0.4%
4 92
 
0.1%
Space Separator
ValueCountFrequency (%)
32343
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 294
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 171675
64.2%
Common 95759
35.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 64800
37.7%
M 33045
19.2%
T 33045
19.2%
R 31755
18.5%
S 2580
 
1.5%
I 1290
 
0.8%
D 1290
 
0.8%
U 1290
 
0.8%
O 1290
 
0.8%
C 1290
 
0.8%
Common
ValueCountFrequency (%)
32343
33.8%
1 31073
32.4%
2 30871
32.2%
0 817
 
0.9%
- 294
 
0.3%
7 269
 
0.3%
4 92
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 267434
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 64800
24.2%
M 33045
12.4%
T 33045
12.4%
32343
12.1%
R 31755
11.9%
1 31073
11.6%
2 30871
11.5%
S 2580
 
1.0%
I 1290
 
0.5%
D 1290
 
0.5%
Other values (7) 5342
 
2.0%

Voucher Type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 258.3 KiB

Value | Count
Sales A/c GST | 31578
Sales | 1286
Sales(Customised) | 99
Credit Note | 82

Length

Max length: 17
Median length: 13
Mean length: 12.695688
Min length: 5

Characters and Unicode

Total characters: 419529
Distinct characters: 22
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Sales
2nd row: Sales
3rd row: Sales
4th row: Sales
5th row: Sales

Common Values

Value | Count | Frequency (%)
Sales A/c GST | 31578 | 95.6%
Sales | 1286 | 3.9%
Sales(Customised) | 99 | 0.3%
Credit Note | 82 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
sales 32864
34.1%
a/c 31578
32.8%
gst 31578
32.8%
sales(customised 99
 
0.1%
credit 82
 
0.1%
note 82
 
0.1%

Most occurring characters

ValueCountFrequency (%)
S 64541
15.4%
63238
15.1%
e 33226
7.9%
s 33161
7.9%
l 32963
7.9%
a 32963
7.9%
G 31578
7.5%
T 31578
7.5%
c 31578
7.5%
/ 31578
7.5%
Other values (12) 33125
7.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 164977
39.3%
Uppercase Letter 159538
38.0%
Space Separator 63238
 
15.1%
Other Punctuation 31578
 
7.5%
Close Punctuation 99
 
< 0.1%
Open Punctuation 99
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 33226
20.1%
s 33161
20.1%
l 32963
20.0%
a 32963
20.0%
c 31578
19.1%
t 263
 
0.2%
i 181
 
0.1%
o 181
 
0.1%
d 181
 
0.1%
m 99
 
0.1%
Other values (2) 181
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
S 64541
40.5%
G 31578
19.8%
T 31578
19.8%
A 31578
19.8%
C 181
 
0.1%
N 82
 
0.1%
Space Separator
ValueCountFrequency (%)
63238
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 31578
100.0%
Close Punctuation
ValueCountFrequency (%)
) 99
100.0%
Open Punctuation
ValueCountFrequency (%)
( 99
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 324515
77.4%
Common 95014
 
22.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 64541
19.9%
e 33226
10.2%
s 33161
10.2%
l 32963
10.2%
a 32963
10.2%
G 31578
9.7%
T 31578
9.7%
c 31578
9.7%
A 31578
9.7%
t 263
 
0.1%
Other values (8) 1086
 
0.3%
Common
ValueCountFrequency (%)
63238
66.6%
/ 31578
33.2%
) 99
 
0.1%
( 99
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 419529
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 64541
15.4%
63238
15.1%
e 33226
7.9%
s 33161
7.9%
l 32963
7.9%
a 32963
7.9%
G 31578
7.5%
T 31578
7.5%
c 31578
7.5%
/ 31578
7.5%
Other values (12) 33125
7.9%

Quantity
Real number (ℝ)

Distinct: 3751
Distinct (%): 11.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.9213287
Minimum: -32.34
Maximum: 41.68
Zeros: 0
Zeros (%): 0.0%
Negative: 81
Negative (%): 0.2%
Memory size: 258.3 KiB

Quantile statistics

Minimum: -32.34
5-th percentile: 0.5
Q1: 1.99
Median: 3.9
Q3: 7.01
95-th percentile: 23.358
Maximum: 41.68
Range: 74.02
Interquartile range (IQR): 5.02

Descriptive statistics

Standard deviation: 6.6687238
Coefficient of variation (CV): 1.1262208
Kurtosis: 5.0029735
Mean: 5.9213287
Median Absolute Deviation (MAD): 2.145
Skewness: 2.202827
Sum: 195670.31
Variance: 44.471877
Monotonicity: Not monotonic
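
Several of the Quantity figures are derived from others in the same block. A short sketch of those relationships, using the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared) and the values printed above:

```python
# Figures copied from the Quantity statistics.
mean, std = 5.9213287, 6.6687238
minimum, maximum = -32.34, 41.68
q1, q3 = 1.99, 7.01

value_range = maximum - minimum   # reported as 74.02
iqr = q3 - q1                     # reported as 5.02
cv = std / mean                   # reported as 1.1262208
variance = std ** 2               # reported as 44.471877

print(f"range={value_range:.2f} iqr={iqr:.2f} cv={cv:.7f} variance={variance:.6f}")
```

A CV above 1 together with positive skewness indicates a long right tail, which fits the gap between the median (3.9) and the 95-th percentile (23.358).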
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2 | 360 | 1.1%
2.02 | 353 | 1.1%
2.01 | 327 | 1.0%
1 | 320 | 1.0%
1.01 | 297 | 0.9%
1.99 | 261 | 0.8%
2.03 | 255 | 0.8%
3 | 241 | 0.7%
3.01 | 206 | 0.6%
1.05 | 186 | 0.6%
Other values (3741) | 30239 | 91.5%

Value | Count | Frequency (%)
-32.34 | 1 | < 0.1%
-25.57 | 1 | < 0.1%
-21.76 | 1 | < 0.1%
-20.04 | 1 | < 0.1%
-18.65 | 1 | < 0.1%
-18.2 | 1 | < 0.1%
-17.77 | 1 | < 0.1%
-16.45 | 1 | < 0.1%
-16.07 | 1 | < 0.1%
-15.3 | 1 | < 0.1%

Value | Count | Frequency (%)
41.68 | 1 | < 0.1%
41.46 | 1 | < 0.1%
39.388 | 1 | < 0.1%
39.38 | 1 | < 0.1%
39.06 | 1 | < 0.1%
39.03 | 1 | < 0.1%
35.26 | 1 | < 0.1%
35.03 | 1 | < 0.1%
34.73 | 1 | < 0.1%
34.646 | 1 | < 0.1%

Rate
Real number (ℝ)

Distinct: 1127
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48518.211
Minimum: 19590
Maximum: 83000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 258.3 KiB

Quantile statistics

Minimum: 19590
5-th percentile: 36000
Q1: 41750
Median: 45700
Q3: 56000
95-th percentile: 65000
Maximum: 83000
Range: 63410
Interquartile range (IQR): 14250

Descriptive statistics

Standard deviation: 9640.5077
Coefficient of variation (CV): 0.19869875
Kurtosis: -0.080464253
Mean: 48518.211
Median Absolute Deviation (MAD): 6200
Skewness: 0.70465357
Sum: 1.6032843 × 10^9
Variance: 92939389
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
44000 | 961 | 2.9%
43000 | 845 | 2.6%
43500 | 732 | 2.2%
42500 | 678 | 2.1%
46000 | 677 | 2.0%
42000 | 668 | 2.0%
48000 | 647 | 2.0%
44500 | 638 | 1.9%
45000 | 580 | 1.8%
45500 | 543 | 1.6%
Other values (1117) | 26076 | 78.9%

Value | Count | Frequency (%)
19590 | 3 | < 0.1%
21500 | 9 | < 0.1%
21542 | 3 | < 0.1%
21590 | 1 | < 0.1%
22500 | 3 | < 0.1%
30200 | 1 | < 0.1%
30250 | 1 | < 0.1%
30600 | 2 | < 0.1%
30900 | 3 | < 0.1%
31000 | 1 | < 0.1%

Value | Count | Frequency (%)
83000 | 1 | < 0.1%
82500 | 3 | < 0.1%
82000 | 2 | < 0.1%
81500 | 3 | < 0.1%
81100 | 1 | < 0.1%
81000 | 3 | < 0.1%
80500 | 14 | < 0.1%
80460 | 1 | < 0.1%
80352.54 | 1 | < 0.1%
80000 | 16 | < 0.1%

Value
Real number (ℝ)

Distinct: 23853
Distinct (%): 72.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 283942.72
Minimum: -2126355
Maximum: 2611980
Zeros: 0
Zeros (%): 0.0%
Negative: 82
Negative (%): 0.2%
Memory size: 258.3 KiB

Quantile statistics

Minimum: -2126355
5-th percentile: 24243.6
Q1: 88140
Median: 180147
Q3: 325500
95-th percentile: 1018026
Maximum: 2611980
Range: 4738335
Interquartile range (IQR): 237360

Descriptive statistics

Standard deviation: 331266.04
Coefficient of variation (CV): 1.1666651
Kurtosis: 8.1789313
Mean: 283942.72
Median Absolute Deviation (MAD): 106392
Skewness: 2.5803349
Sum: 9.382887 × 10^9
Variance: 1.0973719 × 10^11
Monotonicity: Not monotonic
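
The Sum and Variance entries for Value are written in scientific notation, as a mantissa times a power of ten. The exponents can be confirmed from the mean, the standard deviation and the row count (33045, from the Overview), as in this small check:

```python
n_rows = 33045                      # number of observations, from the Overview
mean, std = 283942.72, 331266.04    # from the Value statistics

total = mean * n_rows     # sum = mean x number of rows (no missing values)
variance = std ** 2       # variance = standard deviation squared

print(f"sum      = {total:.6e}")      # exponent e+09, i.e. 10^9
print(f"variance = {variance:.7e}")   # exponent e+11, i.e. 10^11
```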
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
126000 | 22 | 0.1%
42000 | 15 | < 0.1%
85570 | 15 | < 0.1%
63000 | 14 | < 0.1%
86860 | 14 | < 0.1%
84840 | 14 | < 0.1%
52400 | 14 | < 0.1%
45000 | 13 | < 0.1%
85850 | 12 | < 0.1%
87000 | 12 | < 0.1%
Other values (23843) | 32900 | 99.6%

Value | Count | Frequency (%)
-2126355 | 1 | < 0.1%
-1305600 | 1 | < 0.1%
-1182360 | 1 | < 0.1%
-1115978.87 | 1 | < 0.1%
-823500 | 1 | < 0.1%
-819570 | 1 | < 0.1%
-780300 | 1 | < 0.1%
-757120 | 1 | < 0.1%
-719687.5 | 1 | < 0.1%
-653458.75 | 1 | < 0.1%

Value | Count | Frequency (%)
2611980 | 1 | < 0.1%
2603604.5 | 1 | < 0.1%
2587200 | 1 | < 0.1%
2553115 | 1 | < 0.1%
2510820 | 1 | < 0.1%
2488682.5 | 1 | < 0.1%
2430890 | 1 | < 0.1%
2426562.5 | 1 | < 0.1%
2422270.24 | 1 | < 0.1%
2373592.5 | 1 | < 0.1%

Interactions

[Nine interaction plots; images not preserved in this extract.]

Correlations

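 | Quantity | Rate | Value | FY | dia | dia group | grade | type | length | Voucher Type
Quantity | 1.000 | -0.077 | 0.978 | 0.055 | 0.071 | 0.059 | 0.021 | 0.206 | 0.064 | 0.410
Rate | -0.077 | 1.000 | 0.105 | 0.538 | 0.065 | 0.093 | 0.501 | 0.183 | 0.156 | 0.165
Value | 0.978 | 0.105 | 1.000 | 0.095 | 0.079 | 0.070 | 0.096 | 0.247 | 0.049 | 0.327
FY | 0.055 | 0.538 | 0.095 | 1.000 | 0.035 | 0.030 | 0.507 | 0.141 | 0.113 | 0.281
dia | 0.071 | 0.065 | 0.079 | 0.035 | 1.000 | 0.944 | 0.063 | 0.081 | 0.069 | 0.005
dia group | 0.059 | 0.093 | 0.070 | 0.030 | 0.944 | 1.000 | 0.021 | 0.065 | 0.066 | 0.010
grade | 0.021 | 0.501 | 0.096 | 0.507 | 0.063 | 0.021 | 1.000 | 0.111 | 0.102 | 0.084
type | 0.206 | 0.183 | 0.247 | 0.141 | 0.081 | 0.065 | 0.111 | 1.000 | 0.866 | 0.078
length | 0.064 | 0.156 | 0.049 | 0.113 | 0.069 | 0.066 | 0.102 | 0.866 | 1.000 | 0.086
Voucher Type | 0.410 | 0.165 | 0.327 | 0.281 | 0.005 | 0.010 | 0.084 | 0.078 | 0.086 | 1.000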
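
The "High correlation" alerts in the Overview can be traced back to this matrix. The sketch below lists every pair whose absolute coefficient reaches a cut-off of 0.5; that threshold is an assumption inferred from the alerts (0.501 is flagged, 0.410 is not), not a setting shown in this report.

```python
names = ["Quantity", "Rate", "Value", "FY", "dia",
         "dia group", "grade", "type", "length", "Voucher Type"]

# Coefficients copied row by row from the matrix above.
matrix = [
    [1.000, -0.077, 0.978, 0.055, 0.071, 0.059, 0.021, 0.206, 0.064, 0.410],
    [-0.077, 1.000, 0.105, 0.538, 0.065, 0.093, 0.501, 0.183, 0.156, 0.165],
    [0.978, 0.105, 1.000, 0.095, 0.079, 0.070, 0.096, 0.247, 0.049, 0.327],
    [0.055, 0.538, 0.095, 1.000, 0.035, 0.030, 0.507, 0.141, 0.113, 0.281],
    [0.071, 0.065, 0.079, 0.035, 1.000, 0.944, 0.063, 0.081, 0.069, 0.005],
    [0.059, 0.093, 0.070, 0.030, 0.944, 1.000, 0.021, 0.065, 0.066, 0.010],
    [0.021, 0.501, 0.096, 0.507, 0.063, 0.021, 1.000, 0.111, 0.102, 0.084],
    [0.206, 0.183, 0.247, 0.141, 0.081, 0.065, 0.111, 1.000, 0.866, 0.078],
    [0.064, 0.156, 0.049, 0.113, 0.069, 0.066, 0.102, 0.866, 1.000, 0.086],
    [0.410, 0.165, 0.327, 0.281, 0.005, 0.010, 0.084, 0.078, 0.086, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, inferred from which pairs are flagged

# Walk the upper triangle so each pair is reported once.
pairs = [
    (names[i], names[j], matrix[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(matrix[i][j]) >= THRESHOLD
]

for a, b, r in sorted(pairs, key=lambda p: -abs(p[2])):
    print(f"{a} ~ {b}: {r:.3f}")
```

The Quantity and Value pair stands out because each row's Value is close to Quantity times Rate, so one of the two columns carries little independent information.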

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | Date | FY | Products | dia | dia group | grade | type | length | Voucher Type | Quantity | Rate | Value
0 | 04-03-2017 | FY 18 | 25MM TATA TISCON FE500D (S) | 25 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 25.49 | 40000.00 | 1019600.00
1 | 04-03-2017 | FY 18 | 08MM TATA TISCON FE500D (S) | 08 MM | 08 MM | 500D | FULL LENGTH | 12 METER | Sales | 9.09 | 43200.00 | 392688.00
2 | 04-03-2017 | FY 18 | 10MM TATA TISCON FE500D (S) | 10 MM | 10 MM | 500D | FULL LENGTH | 12 METER | Sales | 3.94 | 41700.00 | 164298.00
3 | 04-03-2017 | FY 18 | 12MM TATA TISCON FE500D (S) | 12 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 1.93 | 41200.00 | 79516.00
4 | 04-03-2017 | FY 18 | 16MM TATA TISCON FE500D (S) | 16 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 1.03 | 41200.00 | 42436.00
5 | 04-03-2017 | FY 18 | 25MM TATA TISCON FE500D (S) | 25 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 24.91 | 40000.00 | 996400.00
6 | 04-04-2017 | FY 18 | 08MM TATA TISCON FE500D (S) | 08 MM | 08 MM | 500D | FULL LENGTH | 12 METER | Sales | 6.70 | 39976.19 | 267840.47
7 | 04-04-2017 | FY 18 | 12MM TATA TISCON FE500D (S) | 12 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 15.24 | 37976.19 | 578757.14
8 | 04-05-2017 | FY 18 | 16MM TATA TISCON FE500D (S) | 16 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 13.72 | 40000.00 | 548800.00
9 | 04-05-2017 | FY 18 | 20MM TATA TISCON FE500D (S) | 20 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales | 12.77 | 40000.00 | 510800.00

Index | Date | FY | Products | dia | dia group | grade | type | length | Voucher Type | Quantity | Rate | Value
33035 | 3/24/2023 | FY 23 | 25MM TATA TISCON FE550D (T) | 25 MM | 12 MM - 32 MM | 550D | FULL LENGTH | 12 METER | Sales A/c GST | 8.87 | 60000.0 | 532200.0
33036 | 3/25/2023 | FY 23 | 08MM TATA TISCON FE550D (T) | 08 MM | 08 MM | 550D | FULL LENGTH | 12 METER | Sales A/c GST | 7.59 | 62000.0 | 470580.0
33037 | 3/25/2023 | FY 23 | 10MM TATA TISCON FE500D (T) | 10 MM | 10 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 3.72 | 61000.0 | 226920.0
33038 | 3/25/2023 | FY 23 | 12MM TATA TISCON FE500D (T) | 12 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 8.66 | 60000.0 | 519600.0
33039 | 3/25/2023 | FY 23 | 16MM TATA TISCON FE500D (T) | 16 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 4.57 | 60000.0 | 274200.0
33040 | 3/25/2023 | FY 23 | 20MM TATA TISCON FE500D (T) | 20 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 4.08 | 60000.0 | 244800.0
33041 | 3/30/2023 | FY 23 | 08MM TATA TISCON FE550D (T) | 08 MM | 08 MM | 550D | FULL LENGTH | 12 METER | Credit Note | -1.07 | 61000.0 | -65270.0
33042 | 3/30/2023 | FY 23 | 10MM TATA TISCON FE550D (T) | 10 MM | 10 MM | 550D | FULL LENGTH | 12 METER | Credit Note | -1.50 | 60000.0 | -90000.0
33043 | 3/30/2023 | FY 23 | 12MM TATA TISCON FE550D (T) | 12 MM | 12 MM - 32 MM | 550D | FULL LENGTH | 12 METER | Credit Note | -20.04 | 59000.0 | -1182360.0
33044 | 3/30/2023 | FY 23 | 20MM TATA TISCON FE500D (T) | 20 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Credit Note | -4.60 | 59000.0 | -271400.0
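
In the sample rows, Value tracks Quantity times Rate, including the negative Credit Note entries. A quick check on a few rows copied from the tables above (whether this holds for every row in the dataset cannot be confirmed from the report alone):

```python
# (Quantity, Rate, Value) triples copied from the sample rows.
rows = [
    (25.49, 40000.00, 1019600.00),
    (9.09, 43200.00, 392688.00),
    (6.70, 39976.19, 267840.47),
    (-20.04, 59000.0, -1182360.0),   # Credit Note row
]

# Largest absolute gap between Quantity x Rate and the recorded Value.
max_err = max(abs(q * r - v) for q, r, v in rows)
print(f"largest deviation: {max_err:.4f}")
```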

Duplicate rows

Most frequently occurring

Index | Date | FY | Products | dia | dia group | grade | type | length | Voucher Type | Quantity | Rate | Value | # duplicates
43 | 5/30/2018 | FY 19 | CUSTOMISED 08MM TATA TISCON FE500D (S) | 08 MM | 08 MM | 500D | CAB | CUSTOMISED | Sales A/c GST | 1.00 | 52400.0 | 52400.0 | 3
52 | 7/30/2019 | FY 20 | CUSTOMISED 12MM TATA TISCON FE500D (S) | 12 MM | 12 MM - 32 MM | 500D | CAB | CUSTOMISED | Sales A/c GST | 0.33 | 44200.0 | 14586.0 | 3
0 | 01-06-2022 | FY 22 | TISCON TMT COIL IS:1786 FE500 D 12 mm(T) | 12 MM | 12 MM - 32 MM | 500D | COIL | 0 METER | Sales A/c GST | 1.99 | 57000.0 | 113430.0 | 2
1 | 01-09-2020 | FY 20 | 10MM TATA TISCON FE500D (S) | 10 MM | 10 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 1.03 | 39500.0 | 40685.0 | 2
2 | 01-09-2020 | FY 20 | 16MM TATA TISCON FE500D (S) | 16 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 2.03 | 40200.0 | 81606.0 | 2
3 | 02-10-2022 | FY 22 | 16MM TATA TISCON FE500D (T) | 16 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 2.03 | 61250.0 | 124337.5 | 2
4 | 02-12-2019 | FY 19 | 16MM TATA TISCON FE500D (S) | 16 MM | 12 MM - 32 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 3.10 | 43000.0 | 133300.0 | 2
5 | 02-12-2020 | FY 20 | 10MM TATA TISCON FE500D (S) | 10 MM | 10 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 1.95 | 44000.0 | 85800.0 | 2
6 | 04-07-2018 | FY 19 | 08MM TATA TISCON FE500D (S) | 08 MM | 08 MM | 500D | FULL LENGTH | 12 METER | Sales A/c GST | 6.47 | 50000.0 | 323500.0 | 2
7 | 05-03-2022 | FY 23 | 08MM TATA TISCON FE550D (T) | 08 MM | 08 MM | 550D | FULL LENGTH | 12 METER | Sales A/c GST | 1.97 | 76000.0 | 149720.0 | 2